Rule base combined linguistics knowledge with corpus

Authors

Ying Liu

Chengqing Zong

Abstract

This paper proposes a new approach to constructing rule bases for transfer-based machine translation. In our approach, the rule bases are constructed by combining linguistic knowledge with large-scale corpora. On the one hand, lexical, syntactic, and semantic knowledge are all used in the rules. On the other hand, this knowledge is used for the statistics and self-learning of rules. In each rule base, all rules are scored and ranked, so that an objective choice can be made for each sentence. Preliminary experimental results show that the approach may speed up construction of the rule base and improve the quality of the rules.
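
The abstract does not say how rules are scored, so the following is only a minimal sketch of the scoring-and-ranking idea, not the authors' method. It assumes each rule keeps corpus counts of how often it was applied and how often the result was accepted, and it ranks rules by a smoothed success rate. The `Rule` class, its fields, the example rule names, and the scoring formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical transfer rule; the fields are illustrative only."""
    name: str
    applied: int = 0   # times the rule matched a corpus sentence
    correct: int = 0   # times the match yielded an accepted translation

    def score(self) -> float:
        # Add-one smoothed success rate (an assumed scoring function,
        # not one taken from the paper).
        return (self.correct + 1) / (self.applied + 2)


def rank_rules(rules: list[Rule]) -> list[Rule]:
    """Return the rules ordered from highest to lowest score."""
    return sorted(rules, key=lambda r: r.score(), reverse=True)


# Made-up counts, purely for illustration.
rules = [
    Rule("NP -> Adj N", applied=40, correct=30),
    Rule("NP -> N Adj", applied=10, correct=9),
    Rule("VP -> V NP", applied=5, correct=1),
]
ranked = rank_rules(rules)
best = ranked[0]  # the rule a system would try first for a matching sentence
```

Under these assumptions, choosing among competing rules reduces to taking the top-ranked rule that matches the sentence, which is one way to read the abstract's point about making the choice without human judgment.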

Similar resources

An Approach to Example-Based Machine Translator using Translation Memory

This paper presents an example-based machine translation architecture using translation memory that integrates the use of examples for flexible, idiomatic translations with the use of linguistic rules for broad coverage and grammatical accuracy. The example-based machine translation (EBMT) approach to machine translation is often characterized by its use of a bilingual corpus with parallel texts as ...

A Corpus-Based Approach to Language Learning, Eric Brill

A Corpus-Based Approach to Language Learning. Eric Brill. Supervisor: Mitchell Marcus. One goal of computational linguistics is to discover a method for assigning a rich structural annotation to sentences that are presented as simple linear strings of words; meaning can be much more readily extracted from a structurally annotated sentence than from a sentence with no structural information. Also str...

The Multi-layer Language Knowledge Base of Chinese NLP

This paper introduces an effort to build a multi-layer knowledge base for Chinese NLP that combines list-based, rule-based and corpus-based language information. Different kinds of information are designed to solve the different kinds of problems encountered in Chinese NLP. The whole knowledge base is designed with theoretical consistency and can easily be put into practice in the app...

Exploiting Wikipedia as a Knowledge Base for the Extraction of Linguistic Resources: Application on Arabic-French Comparable Corpora and Bilingual Lexicons

We present simple and effective methods for extracting comparable corpora and bilingual lexicons from Wikipedia. We shall exploit the large scale and the structure of Wikipedia articles to extract two resources that will be very useful for natural language applications. We build a comparable corpus from Wikipedia using categories as topic restrictions and we extract bilingual lexicons from inte...

Using Corpus Statistics and WordNet Relations for Sense Identification

Corpus-based approaches to word sense identification have flexibility and generality but suffer from a knowledge acquisition bottleneck. We show how knowledge-based techniques can be used to open the bottleneck by automatically locating training corpora. We describe a statistical classifier that combines topical context with local cues to identify a word sense. The classifier is used to disambig...

Publication date: 2003
